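Pre-trained large language models (LLMs) such as OpenAI Codex can automate significant aspects of coding by generating natural code from informal natural language (NL) intent. However, the generated code carries no correctness guarantees with respect to the user's intent. In fact, it is hard to even define a notion of correctness, since natural language can be ambiguous and lacks a formal semantics. In this paper, we take a first step towards addressing this problem by proposing the workflow of test-driven user-intent formalization (TDUIF), which leverages lightweight user feedback to jointly (a) formalize the user intent as tests (a partial specification), and (b) generate code that meets the formalized user intent. To perform a scalable, large-scale automated evaluation of the algorithms without requiring a user in the loop, we describe how to simulate user interaction with high fidelity using a reference solution. We also describe and implement alternative implementations of several algorithmic components (including mutating and ranking a set of tests) that can be composed into efficient solutions to the TDUIF problem. We have developed a system, Ticoder, that implements several solutions to TDUIF, and we compare their relative effectiveness on the MBPP academic code generation benchmark. Our results with the OpenAI Codex LLM on MBPP are promising: our best algorithm improves the pass@1 code generation accuracy metric from 48.39% to 55.48% with a single user query, and up to 85.48% with up to 5 user queries. Second, we can generate a non-trivial functional unit test consistent with the user intent within an average of 1.69 user queries for 90.40% of the examples in this dataset.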
Translated by Google Translate
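The interaction loop in the first abstract (propose a test, ask the user whether it matches their intent, prune candidate programs accordingly) can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the candidate programs and tests are hard-coded stand-ins for LLM samples, `simulated_user` and `tduif_loop` are hypothetical names, and the pruning rule for rejected tests (drop every candidate that satisfies a rejected test) is an assumption that holds for deterministic input/output tests.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a test is an (input, expected_output) pair.
Test = Tuple[int, int]
Program = Callable[[int], int]


def simulated_user(reference: Program, test: Test) -> bool:
    """Stand-in for the user: approve a test iff the reference solution passes it."""
    arg, expected = test
    return reference(arg) == expected


def passes(program: Program, test: Test) -> bool:
    """Run one candidate on one test; any exception counts as a failure."""
    arg, expected = test
    try:
        return program(arg) == expected
    except Exception:
        return False


def tduif_loop(
    candidates: List[Program],
    tests: List[Test],
    reference: Program,
    max_queries: int = 5,
) -> Tuple[List[Program], List[Test]]:
    """Query the (simulated) user about up to max_queries tests and prune candidates."""
    approved: List[Test] = []
    remaining = list(candidates)
    for test in tests[:max_queries]:
        if simulated_user(reference, test):
            # Approved test: it becomes part of the partial specification,
            # and candidates that fail it are discarded.
            approved.append(test)
            remaining = [p for p in remaining if passes(p, test)]
        else:
            # Rejected test (assumed rule): candidates that satisfy it
            # exhibit behaviour the user rejected, so they are discarded.
            remaining = [p for p in remaining if not passes(p, test)]
    return remaining, approved


# Toy intent: "square a number". All three candidates agree on input 2.
def square(x: int) -> int:
    return x * x


def double(x: int) -> int:
    return x * 2


def add_two(x: int) -> int:
    return x + 2


remaining, approved = tduif_loop(
    [square, double, add_two], [(2, 4), (3, 6), (3, 9)], reference=square
)
```

In this toy run the first test cannot separate the candidates, the rejected test `(3, 6)` eliminates `double`, and the approved test `(3, 9)` eliminates `add_two`, leaving one program and two approved tests as the formalized intent.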
Large language models (LLMs) have demonstrated an impressive ability to generate code for various programming tasks. In many instances, LLMs can generate a correct program for a task when given numerous trials. Consequently, a recent trend is to sample programs from a model at large scale and then filter or rank them based on their execution on a small number of known unit tests to select one candidate solution. However, these approaches assume that the unit tests are given and that the generated programs, which can perform arbitrary dangerous operations such as file manipulations, can be executed safely. Both assumptions are impractical in real-world software development. In this paper, we propose CodeRanker, a neural ranker that can predict the correctness of a sampled program without executing it. CodeRanker is fault-aware, i.e., it is trained to predict different kinds of execution information, such as the exact compile/runtime error type (e.g., an IndexError or a TypeError). We show that CodeRanker can significantly increase the pass@1 accuracy of various code generation models (including Codex, GPT-Neo, GPT-J) on the APPS, HumanEval and MBPP datasets.
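A key challenge for reinforcement learning is solving long-horizon planning problems. Recent work has leveraged programs to guide reinforcement learning in these settings. However, these approaches impose a high manual burden on the user, who must provide a guiding program for every new task. Partially observed environments further complicate the programming task, because the program must implement a strategy that correctly, and ideally optimally, handles every possible configuration of the hidden regions of the environment. We propose a new approach, model predictive program synthesis (MPPS), that uses program synthesis to automatically generate the guiding programs. It trains a generative model to predict the unobserved portions of the world, and then synthesizes a program based on samples from this model in a way that is robust to its uncertainty. In our experiments, we show that our approach significantly outperforms non-program-guided approaches on a set of challenging benchmarks, including a 2D Minecraft-inspired environment in which the agent must complete a complex sequence of subtasks to achieve its goal, and achieves performance similar to using handcrafted programs to guide the agent. Our results demonstrate that our approach can obtain the benefits of program-guided reinforcement learning without requiring the user to provide a new guiding program for every new task.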
Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as supply chains (inventory optimization), traffic, and the transition towards carbon-free energy generation through battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively little research has been done in this area. This paper presents the findings of the "IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim of fostering and facilitating research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that leads to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.
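The idea of optimizing over all forecast scenarios jointly can be illustrated with a toy sample average approximation (SAA). This is only a sketch under invented assumptions, not the winning competition solution: the decision is a single hypothetical quantity (how much energy to store in advance), the prices in `energy_cost` are made up, and the search is a brute-force scan over a handful of candidate decisions rather than a mixed integer program.

```python
from typing import Callable, Sequence


def saa_choose(
    decisions: Sequence[float],
    scenarios: Sequence[float],
    cost: Callable[[float, float], float],
) -> float:
    """Return the decision with the lowest average cost over all sampled scenarios."""

    def average_cost(decision: float) -> float:
        return sum(cost(decision, s) for s in scenarios) / len(scenarios)

    return min(decisions, key=average_cost)


def energy_cost(stored_kwh: float, demand_kwh: float) -> float:
    # Hypothetical prices: storing energy in advance costs 1 per kWh,
    # any shortfall must be bought at a peak price of 2 per kWh.
    shortfall = max(0.0, demand_kwh - stored_kwh)
    return 1.0 * stored_kwh + 2.0 * shortfall


# Three forecast demand scenarios (kWh) and four candidate decisions (kWh).
scenarios = [2.0, 4.0, 6.0]
decisions = [0.0, 2.0, 4.0, 6.0]
best = saa_choose(decisions, scenarios, energy_cost)
```

With these invented numbers the average costs are 8, 6, 16/3 and 6 for the four candidates, so the scan selects 4.0 kWh: a single decision that is good on average across scenarios rather than optimal for any one forecast.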
6D object pose estimation has been a research topic in computer vision and robotics. Many real-world applications, such as robot grasping, manipulation, and autonomous navigation, require the correct pose of the objects present in a scene to perform their specific task. Pose estimation becomes even harder when the objects are placed in a cluttered scene and the level of occlusion is high. Prior works have tried to overcome this problem but could not achieve accuracy that can be considered reliable in real-world applications. In this paper, we present an architecture that, unlike prior work, is context-aware: it uses the context information available about the objects. Our proposed architecture treats objects separately according to their type, i.e., symmetric or non-symmetric. A deeper estimator and refiner network pair is used for non-symmetric objects than for symmetric ones because of their intrinsic differences. Our experiments show an accuracy improvement of about 3.2% on the LineMOD dataset, which is considered a benchmark for pose estimation in occluded and cluttered scenes, over the prior state-of-the-art DenseFusion. Our results also show that the inference time is sufficient for real-time usage.
Machine learning (ML) is revolutionizing protein structural analysis, including the important subproblem of predicting protein residue contact maps, i.e., which amino-acid residues are in close spatial proximity given the amino-acid sequence of a protein. Despite recent progress in ML-based protein contact prediction, predicting contacts over a wide range of distances (commonly classified into short-, medium- and long-range contacts) remains a challenge. Here, we propose a multiscale graph neural network (GNN) based approach that takes a cue from multiscale physics simulations, in which a standard pipeline involving a recurrent neural network (RNN) is augmented with three GNNs to refine predictive capability for short-, medium- and long-range residue contacts, respectively. Test results on the ProteinNet dataset show improved accuracy for contacts of all ranges using the proposed multiscale RNN+GNN approach over the conventional approach, including the most challenging case of long-range contact prediction.
Acquiring food items with a fork poses an immense challenge to a robot-assisted feeding system, due to the wide range of material properties and visual appearances present across food groups. Deformable foods necessitate different skewering strategies than firm ones, but inferring such characteristics for several previously unseen items on a plate remains nontrivial. Our key insight is to leverage visual and haptic observations during interaction with an item to rapidly and reactively plan skewering motions. We learn a generalizable, multimodal representation for a food item from raw sensory inputs which informs the optimal skewering strategy. Given this representation, we propose a zero-shot framework to sense visuo-haptic properties of a previously unseen item and reactively skewer it, all within a single interaction. Real-robot experiments with foods of varying levels of visual and textural diversity demonstrate that our multimodal policy outperforms baselines which do not exploit both visual and haptic cues or do not reactively plan. Across 6 plates of different food items, our proposed framework achieves 71% success over 69 skewering attempts total. Supplementary material, datasets, code, and videos are available on our website: https://sites.google.com/view/hapticvisualnet-corl22/home
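Outlier detection is a challenging activity. Several machine learning techniques have been proposed in the literature for outlier detection. In this paper, we propose a new training approach for the bidirectional GAN (BiGAN) to detect outliers. To validate the proposed approach, we use it to train a BiGAN that detects taxpayers who are manipulating their tax returns. For each taxpayer, we derive six correlation parameters and three ratio parameters from the tax returns he or she submitted. We train a BiGAN with the proposed training approach on this nine-dimensional derived ground-truth dataset. Next, we generate the latent representation of this dataset using the $encoder$ (that is, we encode the dataset with the $encoder$) and regenerate the dataset using the $Generator$ (that is, we decode it with the $Generator$) by supplying this latent representation as input. For each taxpayer, we compute the cosine similarity between his or her ground-truth data and the regenerated data. Taxpayers with lower cosine similarity measures are potential return manipulators. We applied our method to analyze the iron and steel taxpayers dataset provided by the Commercial Taxes Department, Government of Telangana, India.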
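The final scoring step (compare each taxpayer's original vector with its regenerated vector and rank by cosine similarity) can be sketched as follows. This is a minimal illustration, not the authors' code: no BiGAN is trained here, the regenerated vectors are hard-coded stand-ins for the encode-then-decode output, the vectors have three dimensions instead of nine to keep the example short, and `rank_suspects` is a hypothetical helper name.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_suspects(
    original: Dict[str, Sequence[float]],
    regenerated: Dict[str, Sequence[float]],
) -> List[str]:
    """Return taxpayer ids ordered from lowest to highest cosine similarity."""
    scores = {
        taxpayer: cosine_similarity(original[taxpayer], regenerated[taxpayer])
        for taxpayer in original
    }
    return sorted(scores, key=lambda taxpayer: scores[taxpayer])


# Toy data: derived parameters per taxpayer (assumed values).
original = {"A": [1.0, 0.0, 0.0], "B": [1.0, 1.0, 0.0], "C": [0.0, 1.0, 1.0]}
# Stand-in for the output of decoding the encoded representation.
regenerated = {"A": [1.0, 0.0, 0.0], "B": [1.0, 0.0, 0.0], "C": [1.0, 0.0, 0.0]}

ranked = rank_suspects(original, regenerated)
```

Taxpayers at the front of `ranked` are the ones whose data the model reconstructs worst, which under the approach described above makes them the first candidates for manual review.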
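Circular trading is a form of tax evasion in Goods and Services Tax in which a group of fraudulent taxpayers (traders) aims to mask illegal transactions by superimposing several fictitious transactions (in which little value is added to the goods or services) among themselves within a short period. Because of the vast database of taxpayers, it is infeasible for authorities to manually identify groups of circular traders and the illegal transactions they are involved in. This work uses big data analytics and graph representation techniques to propose a framework that identifies communities of circular traders and isolates the illegal transactions within the respective communities. Our approach is tested on real-life data provided by the Department of Commercial Taxes, Government of Telangana, India, where we uncovered several communities of circular traders.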
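Images collected from outdoor visual environments are often degraded by the presence of dense smoke or haze. A key challenge for research on scene understanding in these degraded visual environments (DVE) is the lack of representative benchmark datasets. Such datasets are required to evaluate state-of-the-art object recognition and other computer vision algorithms in degraded settings. In this paper, we address some of these limitations by introducing the first paired real-image benchmark dataset with hazy and haze-free images, together with in-situ haze density measurements. The dataset was produced in a controlled environment with professional smoke-generating machines that covered the entire scene, and it consists of images captured from the perspective of both an unmanned aerial vehicle (UAV) and an unmanned ground vehicle (UGV). We also evaluate a set of representative state-of-the-art dehazing approaches as well as object detectors on the dataset. The full dataset presented in this paper, including the ground-truth object classification bounding boxes and haze density measurements, is provided for the community to evaluate their algorithms at: https://a2i2-archangel.vision. A subset of this dataset has been used for the Object Detection in Haze Track of the CVPR UG2 2022 challenge.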